Alleviation of the Fermion-sign problem by optimization of many-body wave functions 
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We present a simple, robust and highly efficient method for optimizing all parameters of many- 
body wave functions in quantum Monte Carlo calculations, applicable to continuum systems and 
lattice models. Based on a strong zero-variance principle, diagonalization of the Hamiltonian matrix 
in the space spanned by the wave function and its derivatives determines the optimal parameters. 
It systematically reduces the fixed-node error, as demonstrated by the calculation of the binding 
energy of the small but challenging C2 molecule to the experimental accuracy of 0.02 eV. 
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Many important problems in computational physics 
and chemistry can be reduced to the computation of the 
dominant eigenvalues of matrices or integral kernels. For 
problems with many degrees of freedom, Monte Carlo ap- 
proaches such as diffusion, reptation and transfer-matrix 
Monte Carlo are among the most accurate and efficient 
methods Their efficiency and in most cases accu- 
racy rely crucially on good approximate eigenstates. For 
Fcrmionic systems, the antisymmetry constraint leads to 
the Fermion-sign problem that forces practical implemen- 
tations to fix the nodes to those of an approximate trial 
wave function. The resulting fixed-node error is the main 
uncontrolled error in quantum Monte Carlo (QMC). De- 
spite great effort aimed at developing algorithms that 
eliminate this error entirely Q, to date there is no gen- 
erally applicable solution, therefore systematic improve- 
ment of the wave function by optimization of an increas- 
ing number of variational parameters is the most practi- 
cal approach for reducing this error. 

In this letter we develop a robust and highly efficient 
method for optimizing all parameters in approximate 
wave functions. Application to the small but challeng- 
ing C2 molecule demonstrates that the method system- 
atically eliminates the fixed-node error. The method is 
based on energy minimization in variational Monte Carlo 
(VMC) and extends the zero-variance linear optimiza- 
tion method Q to nonlinear parameters. The approach 
is simpler than the Newton method Q as it does not re- 
quire second derivatives of the wave function. Moreover, 
in contrast to the perturbative optimization method 0, 
it enables the simultaneous parameter optimization of 
both the Jastrow and the determinantal parts of the wave 
function with equal efficiency. 

Form of wave functions. We employ iV-electron wave 
functions which depend on N opt variational parameters 
collectively denoted by p = (c, a, A) and 3N electronic 
coordinates, R, 



*(p,R) 



JVcSF 

J(a,R) J2 c * c * 

i=l 



A,R) 



(1) 



where J(ot, R) is a Jastrow factor that contains electron- 
nuclear, electron-electron and electron-electron- nuclear 
terms. Each of the -/Vcsf configuration state functions 
(CSFs), Cj(A, R), is a symmetry-adapted linear combi- 
nation of Slater determinants of single-particle orbitals 
(j>ij, (r) which are themselves expanded in terms of iVbasis 

basis functions x^( r ) : 0m ( r ) = E^T* V^( r )- Thc 
variational parameters are the linear CSF coefficients c, 
and thc nonlinear Jastrow parameters a and expansion 
coefficients of the orbitals A. In practice, we optimize 
only Acsf — 1 CSF coefficients (as the normalization 
of the wave function is irrelevant) and a set of non- 
redundant orbital rotation parameters @, @] • 

Optimization of wave functions. We extend the zero- 
variance method of Nightingale and Alaverdian Q for 
linear parameters to nonlinear parameters. At each op- 
timization step, the wave function is expanded to linear- 
order in Ap = p — p° around the current parameters p°, 



#1 



(p,R) = * (R) + ^Ap, * 4 (R), 



(2) 



is the current wave function and 
po are the AT op t derivatives of the 



where ^0 = ^(Po) 

= (dv(p)/d P i) p . 

wave function with respect to the parameters. On an 
infinite Monte Carlo (MC) sample, the parameter varia- 
tions Ap minimizing thc energy calculated with the lin- 
earized wave function of Eq. (f5]) are the lowest eigenvalue 
solution of thc generalized eigenvalue equation 



HAp = ESAp, 



(3) 



where H and S are the Hamiltonian and overlap 
matrices in thc (iV op t + l)-dimcnsional basis formed 
by the current wave function and its derivatives 
{*o,*i,^ 2 ,--- >*JV op J; and Ap = 1. On a finite MC 
sample, following Ref. y, we estimate these matrices by 



Sij 



*0 *0/*2 



(4) 
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where H is the electronic Hamiltonian. Wc employ the 
notation that ( 42^ ) denotes the statistical esti- 

mate of (*i|A|*j)/(* |*o) evaluated using MC config- 
urations sampled from vfrg. The estimator for the matrix 
H in Eq. ([¥]) is nonsymmetric and is not the symmetric 
Hamiltonian matrix that one would obtain by minimiz- 
ing the energy of the finite MC sample. In fact, as shown 
m Ref. 0, it is only this nonsymmetric estimator for H 
that leads to a strong zero-variance principle: the vari- 
ance of the parameter changes Ap in Eq. Q vanishes not 
only in the limit that V&o is the exact eigenstate but even 
in the limit that ^u n with optimized parameters is an 
exact eigenstate. In practice, for the wave functions we 
employ, and optimizing on small MC samples, the asym- 
metric Hamiltonian results in 1 to 2 orders of magnitude 
smaller fluctuations in the parameter values. 

Quite generally, methods that minimize the energy of 
a finite MC sample are stable only if a much larger num- 
ber of MC configurations is employed than is necessary 
for the variance minimization method 0] because the en- 
ergy of a finite sample is not bounded from below but 
the variance is. However, it is possible to devise sim- 
ple modifications of these energy-minimization methods 
that require orders of magnitude fewer MC configura- 
tions. Both the zero-variance linear method employed 
here and the modified Newton method Q are examples 
of such modifications. 

Having obtained the parameter variations Ap by solv- 
ing Eq. ([3]), there is no unique way to update the param- 
eters in the wave function. The simplest procedure of in- 
crementing the current parameters by Ap, po — > po + Ap 
works for the linear parameters but is not guaranteed to 
work for the nonlinear parameters if the linear approxi- 
mation of Eq. ([2]) is not good. A better but more compli- 
cated procedure is to fit the original wave function form 
to the optimal linear combination. We next discuss a 
simple prescription that avoids doing this fit. 

At first, it may appear that nothing can be done to 
make the linear approximation better, but this is in 
fact not the case. One can exploit the freedom of the 
normalization of the wave function to alter the depen- 
dence on the nonlinear parameters as follows. Con- 
sider the differently- normalized wave function ^(p, R) = 
iV(p)^(p,R) such that f(p ,R) = ^(p ,R) = *o(R) 
and N(p) depends only on the nonlinear parameters. 
Then, the derivatives of ^(p) are 



* 4 = *i + N^o, where N, = (dN(p)/d Pl ) l 



(5) 



with Ni = for linear parameters. The first-order ex- 
pansion of ^(p) after optimization is 



flin = *0 + APt 



(6) 



same variational space they must be proportional to each 
other, so Ap are related to Ap by a uniform rescaling 



Ap = 



Ap 



1 - Etr N i&p 



(7) 



Since the rescaling factor can be anywhere between — oo 
and oo the choice of normalization can affect not only the 
magnitude of the parameter changes but even the sign. 

For the nonlinear parameters, we have found that, in 
all cases considered here, a fast and stable optimization is 
achieved if Ni are determined by imposing the condition 
that each derivative \T/i is orthogonal to a linear combina- 
tion of * and * lin , i.e., + (1 - = 0, 

where £ is a constant between and 1, resulting in 

N = ^TTZ , (8) 



£D + (1-0(1 + Er S 0j A Pj ) 



with D 



1 



r, v-^nonlin . 

2 E 7 SojA Pj 



Enonlin a A A 
i.k SjkApjApk, 



Since ^u n and "fun were obtained by optimization in the 



where the sums are over only the nonlinear parameters. 
The simple choice £ = 1 first used by Sorella in the con- 
text of the stochastic reconfiguration method Q leads in 
many cases to good parameter variations, but in some 
cases can result in arbitrarily large parameter variations 
that may or may not be desirable. The safer choice £ = 
minimizes the norm of the linear wave function change 
H^iin — S&o 1 1 m the case that only the nonlinear parame- 
ters are varied, but, it can yield arbitrarily small param- 
eter changes even far from the energy minimum. In con- 
trast, the choice £ = 1/2, imposing l^imll = H^oll, guar- 
antees finite parameter changes, until the energy mini- 
mum is reached. 

If instead of finding the parameter changes from 
Eqs. (p| and (O one expands the Rayleigh quotient 
(*im|-ff|*iin)/(*iin|*im) with £ = 1 to second order in 
Ap one recovers the stochastic reconfiguration with Hes- 
sian acceleration (SRH) method with (3 = of Ref. 
It turns out that the SRH method is much less stable 
and converges more slowly, particularly for large systems 
with many parameters. 

Stabilization of the optimization. When the parameter 
values are far from optimal or if the MC sample used 
to evaluate the H and S matrices is very small, then 
the updated parameters may be worse than the original 
ones. However, it is possible to devise a scheme to stabi- 
lize the method in a manner similar to that used for the 
modified Newton method Stabilization is achieved 
by adding a positive constant, adiag > 0, to the diagonal 
of the Hamiltonian matrix except for the first element: 
Hjj — > Hij + adia g <%(l — Sio). As adi ag becomes larger 
the parameter variations Ap become smaller and rotate 
towards the steepest descent direction. In practice, the 
value of adiag is automatically adjusted at each optimiza- 
tion step. Once the matrices H and S have been com- 
puted, three values of Odi ag differing from each other by 
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FIG. 1: Convergence of the VMC total energy of the C2 
molecule with a HF pseudopotential [ll[ when simultaneously 
optimizing 24 Jastrow, 73 CSF and 174 orbital parameters 
for a truncated CAS(8,14) wave function. The insert is an 
enlargement of the last iterations. 

factors of 10 are used to predict three new wave functions. 
A short MC run is then performed using correlated sam- 
pling to compute energy differences of these wave func- 
tions more accurately than the energy itself. The optimal 
value of adiag is then calculated by parabolic interpola- 
tion on these three energies, with some bounds imposed. 

Results. We demonstrate the performance of our op- 
timization method on the ground state of the C2 
molecule at the experimental equilibrium interatomic 
distance of 2.3481 Bohr, employing a scalar-relativistic 
Hartree-Fock (HF) pseudopotential ll| with a large 
Gaussian polarized valence quintuple-zeta one-electron 
basis (12 s, 10 p and 4 d functions contracted to 5 s, 5 p 
and 4 d functions) to ensure basis-set convergence. The 
quantum chemistry package GAMESS [l2j is used to ob- 
tain multi-configurational self-consistent field (MCSCF) 
wave functions in complete active spaces (CAS) gener- 
ated by distributing n valence electrons in m orbitals 
[CAS (n, m)]. We also employ restricted active space 
RAS(n,m) wave functions consisting of a truncation of 
the CAS(n,m) wave functions to quadruple excitations. 
By applying a variable cutoff on the CSF coefficients, 
only the Slater determinants with the largest CSF coef- 
ficients are retained in these wave functions, which are 
then multiplied by a Jastrow factor with essentially all 
free parameters chosen to be zero to form our starting 
trial wave functions. 

Fig. Q] illustrates the convergence of the VMC en- 
ergy during the simultaneous optimization of 24 Jas- 
trow, 73 CSF and 174 orbital parameters for a truncated 
CAS(8,14) wave function. The energy converges to an 
accuracy of about 10 -3 Hartree in 5 iterations, making 
it the most rapidly convergent method for optimizing all 
the parameters in Jastrow-Slater wave functions. 

Fig. H shows the total VMC and DMC energies of 
the C2 molecule as a function of the number of deter- 
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FIG. 2: (Color online) VMC and DMC total energies of the 
C2 and Si2 molecules, versus the number of determinants in 
truncated Jastrow-Slater CAS(8,14) wave functions, for dif- 
ferent levels of optimization. Both VMC and DMC energies 
decrease monotonically if the CSF coefficients are reoptimized 
in VMC but not if only the Jastrow factor is optimized. 

minants retained in truncated Jastrow-Slater CAS(8,14) 
wave functions. For comparison, the convergence of the 
3 E+ ground state energy of Si2 is also shown. For each 
of VMC and DMC, there are three curves. For the up- 
per curve, only the Jastrow parameters are optimized 
and the CSF and orbital coefficients are fixed at their 
MCSCF values. For the middle curve, the Jastrow and 
CSF parameters are optimized simultaneously while, for 
the lower curve, the Jastrow, CSF and orbital parame- 
ters are optimized simultaneously. With fixed CSF coef- 
ficients, the energy does not improve monotonically with 
the number of determinants. In contrast, if the CSF co- 
efficients are reoptimized then of course the VMC energy 
improves monotonically. Remarkably, so does the DMC 
energy, implying that the nodal hypcrsurface of the wave 
function is monotonically improved even though only the 
VMC energy is explicitly optimized. 

The difference in the behaviors of the C2 and Si2 
molecules is striking. While for Si2 the energy shows 
a small gradual decrease upon increasing the number of 
determinants, for C2 there is a very rapid initial decrease, 
a manifestation of the presence of strong energetic near- 
degeneracies among valence orbitals. The fixed-node er- 
ror of a single-determinant wave function using MCSCF 
natural orbitals is about 6 mHa for Si2 and about 46 mHa 
for C2, almost an order of magnitude larger! 

Retaining all the determinants of a CAS(8,14) wave 
function would be costly in QMC but one can estimate 
the energy in this limit by extrapolation. Fig. [3] shows 
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FIG. 3: (Color online) Extrapolation (to 1) by quadratic fits 
of the VMC and DMC energies obtained with the truncated, 
fully reoptimized Jastrow-Slater CAS(8,I4) wave functions 
with respect to the sum of the squares of the MCSCF CSF 
coefficients retained, E2f F (ef CSCF ) 2 . 

TABLE I: Total energy and well depth of the C2 molecule, 
with a HF pseudopotential [TT| | . obtained in DMC with a se- 
ries of fully optimized Jastrow-Slater wave functions. The 
well depth in column 3 is obtained using a well-converged 
DMC atomic energy, -5.43401(6) H, whereas that in column 
4 uses atomic CAS wave functions consistent with the molec- 
ular ones in order to achieve some cancellation of error. 

Wave function Total energy (H) Well depth (eV) 



1 determinant 


-11.0566(3) 


5.13(1) 


5.70(1) 


CAS(8,8) 


-11.0922(3) 


6.10(1) 


6.39(1) 


CAS(8,10) 


-11.0939(3) 


6.15(1) 


6.37(1) 


CAS(8,14) 


-11.0962(3) 


6.21(1) 


6.38(1) 


CAS(8,18) 


-11.0986(3) 


6.28(1) 


6.36(1) 


RAS(8,26) 


-11.1007(3) 


6.33(1) 




Estimated exact" 




6.36(2) 


6.36(2) 



a Scalar-relativistic, valence-corrected estimate of Ref. 13 . 



a quadratic fit of the VMC and DMC energies obtained 
with truncated, fully reoptimized wave functions with re- 
spect to the sum of the squares of the MCSCF CSF co- 
efficients retained, (cf CSCP ) 2 . This sum is equal 
to 1 in the limit that all the CSFs are included. 

Table U reports the extrapolated DMC energies and 
well depths (dissociation energy + zero-point energy) for 
a series of fully optimized Jastrow-Slater CAS and RAS 
wave functions. Increasing the size of the active space 
results in a monotonic improvement of the total energy 
and the well depth in column 3 obtained using a good 
estimate of the exact atomic energy from a DMC calcu- 
lation with a CAS(4,13) wave function. Chemical accu- 
racy (1 kcal/mol = 0.04 eV) is reached for the largest 
active space. Alternatively, as shown in column 4, good 
well depths can be obtained using much smaller active 
spaces by relying on a partial cancellation of error em- 
ploying atomic wave functions consistent with the molec- 
ular ones: for the molecular single determinant wave 
function, an atomic single determinant wave function is 



also used; for the molecular CAS(8,m) wave functions, 
atomic CAS(4,m/2) wave functions are used. In this 
case, while the use of a single-determinant wave function 
yields an error of 0.66 eV, all the CAS wave functions 
yield well-depths that agree with the exact one to bet- 
ter than chemical accuracy. This behavior parallels the 
standard quantum chemistry approaches where single- 
determinant reference methods such as coupled cluster 
are in error by as much as 0.2 eV while multi-reference 
configuration interaction calculations yield chemical ac- 
curacy [13}. Density functional theory methods on the 
other hand perform rather poorly, giving a well depth of 
6.69 and 4.69 eV within LDA and B3LYP, respectively. 

Conclusions. The method presented provides a system- 
atic means of eliminating the main limitation of present- 
day QMC calculations, namely the fixed-node error, and 
supercedes variance minimization Q as the method of 
choice for optimizing many body wave functions. Exten- 
sion of the method to geometry optimization will over- 
come the other major limitation of QMC methods. 
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